102 research outputs found

    Learning to Predict Novel Noun-Noun Compounds

    Get PDF
    We introduce temporally and contextually-aware models for the novel task of predicting unseen but plausible concepts, as conveyed by noun-noun compounds in a time-stamped corpus. We train compositional models on observed compounds, more specifically on the composed distributed representations of their constituents across a time-stamped corpus, while giving them corrupted instances (where the head or modifier is replaced by a random constituent) as negative evidence. The models capture generalisations over this data and learn which combinations give rise to plausible compounds and which do not. After training, we query the model for the plausibility of automatically generated novel combinations and verify whether the classifications are accurate. For our best model, we find that in around 85% of the cases, the novel compounds generated are attested in previously unseen data. An additional estimated 5% are plausible despite not being attested in the recent corpus, based on judgments from independent human raters. Comment: 9 pages, 3 figures. To appear at the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019) at ACL 2019. V3: Fixed some typos and updated the Data Preprocessing section.
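    The negative-evidence step described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, vocabulary, and seeding are all hypothetical.

    ```python
    import random

    def corrupt(compound, vocabulary, rng=random.Random(0)):
        """Build a negative example by replacing either the modifier or the
        head of a (modifier, head) compound with a random constituent that
        is not already part of the compound."""
        modifier, head = compound
        slot = rng.choice(["modifier", "head"])
        replacement = rng.choice([w for w in vocabulary if w not in compound])
        return (replacement, head) if slot == "modifier" else (modifier, replacement)

    # Toy vocabulary and an observed compound; each corrupted instance
    # differs from the original in exactly one slot.
    vocab = ["air", "port", "sea", "horse", "book", "case"]
    negatives = [corrupt(("air", "port"), vocab) for _ in range(3)]
    ```

    A classifier trained on observed compounds as positives and such corrupted pairs as negatives can then score unseen combinations for plausibility.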

    Can language models learn analogical reasoning? Investigating training objectives and comparisons to human performance

    Full text link
    While analogies are a common way to evaluate word embeddings in NLP, it is also of interest to investigate whether analogical reasoning is itself a task that can be learned. In this paper, we test several ways to learn basic analogical reasoning, focusing on analogies more typical of those used to evaluate analogical reasoning in humans than of those in commonly used NLP benchmarks. Our experiments find that models are able to learn analogical reasoning, even with a small amount of data. We additionally compare our models against a dataset with a human baseline, and find that after training, models approach human performance.
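    For context, the standard embedding-evaluation baseline this abstract contrasts with is the vector-offset method: solve a:b :: c:? by searching for the word nearest to b - a + c. A minimal sketch with toy 3-d vectors (the embeddings are invented for illustration):

    ```python
    import numpy as np

    # Hypothetical toy embeddings, chosen so the offsets line up exactly.
    emb = {
        "man":   np.array([1.0, 0.0, 0.0]),
        "woman": np.array([1.0, 1.0, 0.0]),
        "king":  np.array([1.0, 0.0, 1.0]),
        "queen": np.array([1.0, 1.0, 1.0]),
    }

    def solve_analogy(a, b, c, emb):
        """Return the word whose vector is closest (by cosine similarity)
        to b - a + c, excluding the three query words themselves."""
        target = emb[b] - emb[a] + emb[c]
        def cos(u, v):
            return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
        candidates = {w: v for w, v in emb.items() if w not in (a, b, c)}
        return max(candidates, key=lambda w: cos(candidates[w], target))
    ```

    Learning analogical reasoning as a task, as the paper does, goes beyond this fixed arithmetic: the model must be trained to produce the mapping rather than relying on pre-existing offset regularities.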

    Measuring the Compositionality of Noun-Noun Compounds over Time

    Get PDF
    We present work in progress on the temporal progression of compositionality in noun-noun compounds. Previous work has proposed computational methods for determining the compositionality of compounds. These methods try to automatically determine how transparent the meaning of the compound as a whole is with respect to the meaning of its parts. We hypothesize that such a property might change over time. We use the time-stamped Google Books corpus for our diachronic investigations, and first examine whether the vector-based semantic spaces extracted from this corpus are able to predict compositionality ratings, despite their inherent limitations. We find that using temporal information helps predict the ratings, although correlation with the ratings is lower than reported for other corpora. Finally, we show changes in compositionality over time for a selection of compounds. Comment: 6 pages, 3 figures. To appear in the proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change 2019 @ ACL 2019. Fixed typos, increased figure size.
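    A common way to operationalise the transparency notion described above is to compare a compound's own corpus-derived vector with a composition (e.g. the sum) of its constituents' vectors. A minimal sketch, with invented toy vectors rather than real embeddings:

    ```python
    import numpy as np

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def compositionality(compound_vec, modifier_vec, head_vec):
        """Transparency score: cosine similarity between the compound's own
        vector and an additive composition of its parts."""
        return cosine(compound_vec, modifier_vec + head_vec)

    # Toy vectors (hypothetical): a transparent compound ("apple tree")
    # sits near the sum of its parts; an opaque one ("couch potato") does not.
    apple, tree = np.array([1.0, 0.2]), np.array([0.1, 1.0])
    apple_tree = np.array([1.0, 1.1])       # close to apple + tree
    couch, potato = np.array([0.9, 0.1]), np.array([0.2, 0.9])
    couch_potato = np.array([-0.8, 0.6])    # idiomatic: far from the sum
    ```

    Computing such a score from embeddings trained on different time slices of a corpus is one way to track how transparency changes over time.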

    The Scope and the Sources of Variation in Verbal Predicates in English and French

    Get PDF
    Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 199-210. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

    Analysis of Data Augmentation Methods for Low-Resource Maltese ASR

    Full text link
    Recent years have seen an increased interest in the computational speech processing of Maltese, but resources remain sparse. In this paper, we consider data augmentation techniques for improving speech recognition for low-resource languages, focusing on Maltese as a test case. We consider three different types of data augmentation: unsupervised training, multilingual training and the use of synthesized speech as training data. The goal is to determine which of these techniques, or combination of them, is most effective at improving speech recognition for languages where the starting point is a small corpus of approximately 7 hours of transcribed speech. Our results show that combining the data augmentation techniques studied here leads to an absolute WER improvement of 15% without the use of a language model. Comment: 12 pages.
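    The word error rate (WER) metric behind that result is the word-level Levenshtein distance between a reference transcript and the system's hypothesis, normalised by the reference length; an "absolute" improvement is a plain difference of two WERs. A self-contained sketch (the example WER figures at the end are hypothetical, not the paper's):

    ```python
    def wer(ref, hyp):
        """Word error rate: word-level edit distance divided by the number
        of words in the reference."""
        r, h = ref.split(), hyp.split()
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i
        for j in range(len(h) + 1):
            d[0][j] = j
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                cost = 0 if r[i - 1] == h[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,      # deletion
                              d[i][j - 1] + 1,      # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(r)][len(h)] / len(r)

    # Hypothetical illustration of an "absolute" improvement of 15 points:
    baseline_wer, augmented_wer = 0.45, 0.30
    absolute_improvement = baseline_wer - augmented_wer  # 0.15
    ```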

    Annotating for Hate Speech: The MaNeCo Corpus and Some Input from Critical Discourse Analysis

    Get PDF
    This paper presents a novel scheme for the annotation of hate speech in corpora of Web 2.0 commentary. The proposed scheme is motivated by the critical analysis of posts made in reaction to news reports on the Mediterranean migration crisis and LGBTIQ+ matters in Malta, which was conducted under the auspices of the EU-funded C.O.N.T.A.C.T. project. Based on the realization that hate speech is not a clear-cut category to begin with, appears to belong to a continuum of discriminatory discourse and is often realized through the use of indirect linguistic means, it is argued that annotation schemes for its detection should refrain from directly including the label 'hate speech,' as different annotators might have different thresholds as to what constitutes hate speech and what does not. In view of this, we suggest a multi-layer annotation scheme, which is pilot-tested against a binary +/- hate speech classification and appears to yield higher inter-annotator agreement. Having motivated our scheme, we then present the MaNeCo corpus on which it will eventually be used: a substantial corpus of on-line newspaper comments spanning 10 years. Comment: 10 pages, 1 table. Appears in Proceedings of the 12th edition of the Language Resources and Evaluation Conference (LREC'20).
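    Inter-annotator agreement of the kind pilot-tested above is commonly measured with chance-corrected statistics such as Cohen's kappa. A minimal sketch for two annotators (the labels are invented; the paper does not specify this particular coefficient):

    ```python
    from collections import Counter

    def cohens_kappa(a, b):
        """Cohen's kappa for two annotators labelling the same items:
        observed agreement corrected for agreement expected by chance."""
        assert len(a) == len(b) and len(a) > 0
        n = len(a)
        p_observed = sum(x == y for x, y in zip(a, b)) / n
        ca, cb = Counter(a), Counter(b)
        p_chance = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
        return (p_observed - p_chance) / (1 - p_chance)
    ```

    A scheme that yields higher kappa than a raw binary +/- hate speech label suggests the multi-layer categories are easier for annotators to apply consistently.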

    Graduate skills and employability: Focus interviews with selected job market stakeholders. UPSKILLS Intellectual output 1.5

    Get PDF
    The final stage of the UPSKILLS needs analysis involved focus interviews with job market stakeholders. This report presents the method used to conduct and analyse the interviews we carried out with twelve job market stakeholders, the main findings of this UPSKILLS task, a discussion of how these findings relate to the results obtained in the previous steps of the needs analysis, and the aims of the UPSKILLS partnership more generally. Assimakopoulos, Stavros; Vella, Michela; van der Plas, Lonneke; Milicevic Petrovic, Maja; Samardžić, Tanja; van der Lek, Iulianna; Bernardini, Silvia; Ferraresi, Adriano; Pallottino, Margherita